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Abstract 

We examine an 11-year data set from the tropical weather station of Tlaxcala, 
Mexico. We find that mutual information drops quickly with the delay, to a 
positive value which relaxes to zero with a time scale of 20 days. We also 
examine the mutual dependence of the observables and conclude that the data 
set gives the equivalent of 8 variables per day, known to a precision of 2%. We 
determine the effective dimension of the attractor to be -D e // ~ H-7 at the scale 
3.5% < R/Rm ax < 8%. We find evidence that the effective dimension increases 
as RjRraax — > 0, supporting a conjecture by Lorenz that the climate system may 
consist of a large number of weakly coupled subsystems, some of which have low- 
dimensional attractors. We perform a local reconstruction of the dynamics in 
phase space; the short-term predictability is modest and agrees with theoretical 
estimates. Useful skill in predictions of 10-day rainfall accumulation anomalies 
reflects the persistence of weather patterns, which follow the 20-day decay rate 
of the mutual information. 
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1. Introduction 

In temperate regions, the large scale approach to meteorology based on 
the reduction of fluid equations to a finite grid has been effective in meeting the 
demands of agricultural planning and disaster prevention. This is not the case 
in tropical regions, where the atmospheric system is highly sensitive to local 
"details", such as the presence of a hill or a lake, which are too subtle to be 
accurately modeled on a grid system. Often it is not even known which details 
are important and which are not. One is tempted to turn to the empirical 
approach: to reconstruct the dynamical model not from physical equations, but 
from data drawn from observations of the system. This approach has been 
applied successfully to scalar time series analysis (Linsay 1991, Abarbanel et 
al. 1990, Farmer and Sidorowich 1987), leading to useful predictions for chaotic 
systems for which the underlying equations are not known a priori. Before 
applying this method to meteorology, one must gain a better understanding of 
tropical atmospheric dynamics from the point of view of chaos theory. That is 
purpose of this article. 

The study of chaos in extended systems, such as atmospheric dynamics, 
is attracting a growing interest as the next challenge in dynamical systems 
analysis. A key question which must be addressed is whether such systems 
can have low-dimensional attractors, so that one could hope to reconstruct the 
dynamics in phase space from observations of the system at regular intervals of 
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time. 

Most attempts at calculating an attractor dimension from weather data 
have encountered the difficulty that the data sets are too scarce to trust that the 
dimension found is really the correlation dimension of the attractor. Rather, the 
scaling graphs generally provide information only on the large scale structure 
of the attractor, and there is no compelling reason to expect that this structure 
should be similar at finer scales. Instead, Lorenz has conjectured that the global 
climate may be well represented by a large number of weakly coupled subsys- 
tems, some of which may have low-dimensional attractors [Lorenz 1991]. If the 
coupling is sufficiently weak, the external perturbations would not be detected 
in the large-scale reconstruction of the attractor, but would imply a much larger 
attractor dimension at smaller scales, reflecting the greater complexity of the 
coupled dynamical system. Lorenz based his conclusion on a study of a model 
system, which consisted of a number of identical copies of the set of three modes 
of the Oberbeck-Boussinesq equations for the convection of a fluid heated from 
below [Lorenz 1963], with coupling terms connecting the different copies. 

In this article, we will consider Lorenz' conjecture in the practical set- 
ting of a tropical weather data set, and attempt to decide to what extent a 
reconstruction of the large-scale structure of the attractor can be helpful for 
the purpose of designing prediction models for extended chaotic systems. 

The data set consists of 19 weather variables measured daily at a single 
weather station, located near the town of Tlaxcala, Mexico. The measurements 
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were performed without interruption or change of equipment over a span of 4015 
days (11 years). Holes and flagrant errors in the data set amounted to less than 
0.5% of the data and were replaced by linear interpolation. The decision to use 
19 variables at a single location, rather than fewer variables at several locations, 
was based on evidence that the data had been compiled most rigorously at the 
central weather station. Of course the 19 variables are not all independent; we 
shall discuss this in more detail in the next section where we show that the data 
set carries as much information as 8 mutually independent variables, or a single 
observable over a time span eight times as long. 

The 19 variables are listed in Table 1. The range of each variable (= 
max. —min.) was chosen to equal 2 b ^ or a decimal fraction thereof, where b(i) 
is the number of significant bits according to our estimate of the measurement 
accuracy. It was then normalized to the unit interval: x£ £ [0, 1], i = 1, N, 
k = (N = 19, n = 4015) and the annual cycle was subtracted. With 

this normalization, the greatest possible distance between two points is equal 
to one: we will denote the normalized distance by R/R max to recall that it is 
less than or equal to one (see the precise definition of the distance function in 
Sec. 3). The dataset is not available on any public-domain databank: please 
contact the author directly for electronic transfers. 

In Sec. 2, we examine the information content of the data set with an 
adaptation of the mutual information method for short data sets. In Sec. 3, 
we show that the data reflects a deterministic chaotic system down to the scale 
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R/Rmax = 3.5%, and determine the effective dimension of the attractor in 
the range 3.5% < R/R max < 8%. A strong increase in the slope just below 
this range, but above the scale where the shortcomings of the data set become 
noticeable, provides evidence in support of Lorenz' conjecture on the nature of 
the climate system. The theoretical predictability limit of the data set is also 
estimated, by two different methods. In Sec. 4, we describe the results of a 
local reconstruction of the dynamics and conclude, based on those results and 
the previous data analysis, on how the large scale structure of the attractor can 
be used in prediction tasks. 
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2 Mutual Information Analysis 



The mutual information between two random variables (j ^ i), 

with statistics specified by a finite set of events with a determined measurement 
error, is defined as follows. One divides the range of the variables into a fixed 
number of intervals ( "bins" ) , such that the size of each bin is greater than the 
error and each allowed bin is visited a statistically significant number of times 
(> 40). For two weather variables with a delay of r days, the data set gives us 
n — r events (x[ % \ a^+V; t = 1, ...,n — r); a division into 64 bins was found to 
satisfy both criteria for all variables but Nos. 9, 10 and 12 (cloud cover, average 
and dominant wind directions). 

The probability distribution of a variable is given by the fraction of the 
events which fall into each bin: the "probability of the bin O a n is 

„ = #{,<■> (, = 1, ■..,„) :»<" 6 P.} 

v ' n 

where the index a labels the bins and # is the cardinality, or number of elements 
in the set. 

The average information needed to specify the variable x^\ knowing its 
probability distribution, is given by 



7( X W) = - < log 2 P(xM) >, 
7 



2.2 



where the average is a sum over a weighted by the probability P(x^ 1 ' G O a ). 
One easily checks that the information is maximal when P(x^) is homogeneous 
and I(x^) is equal to the number of bits needed to count the bins: for 64 bins, 
the maximum information is equal to 6 bits (Table 1). In general, the more 
P(x^) is peaked around a particular value of x^ l \ the less information one 
needs, on average, to specify the value of x^>: I(x^) measures "how surprised 
one feels when, knowing the distribution P(x^), one is told a typical value of 
the variable" (Fraser and Swinney, 1983; Fraser, 1988). 

The two-point probability P(x^ G O a , x^' G Op) is given by the fraction 
of events where x^ falls into the bin O a and x^> falls into the bin Op. The 
information needed to specify two variables is given by 

I(x {l \ x (j) ) = -< log 2 P(x {l \x (j) ) >, 2.3 

where the average is now a sum over a and f3 weighted by P{x^ l \ x^). 

The information needed to specify two correlated variables jointly is less 
than that which is needed to specify each of the variables independently; the 
difference measures the degree of mutual dependence of the two variables and is 
called "mutual information". With the notation P % > = I(x^) : etc., the mutual 
information is given by 

= jii) + _ > _ 2.4 
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If one is concerned with the mutual information for two measurements of 
the same variable with a delay r, the relevant events are {x t , xft T }. Since 
I(xq) = I(xr ) = the mutual information is given by 

7<?(r) = 2/« _/W0)-*W). 2.5 

Likewise, the mutual information between the variable and the delayed 

(?) 

variable x^ T is given by 

(r) = + - /( J (°)^( T )). 2.6 

For a chaotic system, I$(t) — > and {t) — > as r — > oo, Vz, j. 

The mutual information is considered to be the best measure of mutual 
dependence of variables, for non-linear systems (Fraser, 1989). 

Unfortunately, the implementation of this method for short data sets en- 
counters the difficulty that the two-point distribution P{x^\x^') is not accu- 
rately defined, since the number of events which fall into each allowed bin is 
not statistically significant (insufficient sampling). The probability distribution 
comes out rough, which implies that probability is "localized" at statistically 
insignificant peaks in P{x^ l \x^'). Therefore the two-point information is un- 
derestimated and the mutual information is overestimated. This problem can be 
corrected by smoothing the distribution P{x^ l \x^>). We used the discretized 
diffusion equation 
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which we iterate four times (k = 1, ...,4), so that the final value of P a p is a 
weighted average of P over a square of 8 x 8 = 64 neighbouring bins, with 
the strongest weights assigned to the bins closest to (a(3). The probability 
distribution gradient is set to zero on the boundary, where a or (3 = or 65. 
The smoothing parameter e is chosen independently for each pair (i, j) so that 
Im^ij) — > as t — > oo, and for each i, so that Im(r) — > 0. To implement the 
r — > oo limit, we required that the average mutual information over the range 
t = 150, 250 be equal to zero; the r.m.s. of I m over this set of 100 trials gives 
one a measure of the degree of precision of the mutual information estimates: 
an upper bound e < 0.7 was set on the smoothing parameter, and the cases 
where the r.m.s. was greater than 0.05 bits were considered not trustworthy 
due to the short data set. 

The mutual information among the 19 variables is given in Table 2. One 
notes that in most cases mutual informations are low, between and 0.2 bits, 
with a few notable exceptions: for example, the minimum dew point is strongly 
dependent on the minimum registered value of the vapor pressure, and likewise 
for the corresponding maximum values. This is not surprising, given that these 
variables are closely related and reach their extrema more or less simultaneously. 
Also the dry bulb temperature is dependent on the minimum temperature, cloud 
cover, minimum vapor pressure and minimum dew point. The total information 
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at one time step is given by the sum of the informations of the 19 variables 
(diagonal terms in Table 2), minus the sum of the mutual informations Im for 
i < j, or 70.4 - 23.6 = 46.8 bits, roughly equivalent to 8 variables of 6 bits each. 
In this counting, one is neglecting the n — point mutual information, n > 3. 

The mutual information of each variable as a function of the delay is given 
in Table 3. The mutual information drops to about 10% after a delay of one 
day: this is sufficiently small to justify using successive patterns as independent 
axes in a phase space reconstruction, and the total amount of independent data 
available for the reconstruction is 8 x 4015 = 32, 120. 

Some variables have a strong persistence, which is revealed by a relatively 
large mutual information at delays r ~ 15, particularly the dew point and the 
vapor pressure, and the dry bulb temperature and the minimum temperature: 
this indicates a moderate potential for the application of statistical prediction 
models such as the linear autorregressive models or the low-threshold linear 
reconstruction model described in Sees. 3 and 4. The slow decline of mutual 
information with the delay appears to indicate a strong persistence of some 
features of weather patterns, which would be well exploited in such a model. 
The rainfall shows very little persistence but has significant mutual dependence 
with other more persistent variables. This indicates some potential for medium- 
range predictions which would exploit the predictability of these other variables 
to give trend predictions for precipitation anomalies (Sec. 4). This type of 
predictability through the persistence of another variable could be quantified 
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as mutual information by computing the three-point mutual informations. 

The total mutual information as a function of the delay gives one a measure 
of how much additional information one gets by adding one day to a set of T 
consecutive days of data. A complete description of the state of the weather 
system should be such that no new information is gained by adding more data. 
Recalling that a single day of data consists of 46.8 bits of information, the 
information added by introducing a second day is 46.8 - 6.6 = 40.2 bits; the 
third day adds 46.8 - 6.6 - 4.5 = 35.7 bits, etc.; the information added per day 
is represented in Figure 1. The fact that no additional information is gained 
after 14 days of data indicates that the state of the system is well-described if 
one gives the value of these variables for 14 consecutive days. This figure is an 
overestimate: if one could compute the n— point mutual informations, the total 
mutual informations would be larger and the number of days beyond which no 
new net information is gained would be less than 14. We will see below that 
the embedding dimension, computed from the scaling graphs, corresponds to 
7 consecutive days of data, and that the local reconstruction of the dynamics 
gives optimal predictions for T e = 10 — 14. The mutual consistency of these 
results is encouraging. 
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3. Phase Space Reconstruction 

The phase space reconstruction is based on the following idea. One as- 
sumes that the system can be modeled by d first-order differential equations; we 
will see below how one verifies the validity of this assumption and determines 
d. Any set of d independent measurements of the state of the system provides 
a complete set of initial data which determines the trajectory uniquely. In the 
phase space reconstruction method, one begins by choosing d observables to 
provide a convenient representation of the state of the system in IR d . These can 
be measurements of a single physical variable at regular intervals of time, as in 
the method of delays, or measurements of d different variables when another 
variable equals a predetermined value (Poincare section) . Here we will take the 
19 available variables for T consecutive days of data (d = 19 x T). Some of them 
are minimum or maximum values, others are averages over a 24-hour period; all 
are well-defined functions of the continuous-time physical variables over non- 
overlapping intervals of time. Following the assumption that the dynamics is 
determined by first-order differential equations, these induce recursion relations 
for the evolution of the observables from one 24-hour period to the next. The 
phase space reconstruction proposes (1) to examine the geometry of the recon- 
structed attractor (fractal dimension, etc.) and (2) to find an approximation of 
these recursion relations. 

We define a point in phase space as a set of 19 x T variables corresponding 
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to T consecutive days of observations. The data set then gives one 4015 — T 
points with which we shall attempt to reconstruct the attractor in the embed- 
ding phase space: ~x k= {%k-t '■ * = 1> •••> -^"j t = 0, T — 1}. We will use the 
arrowed vector notation for phase space points, and bold-face letters for a single 
day of observations: e JR N . The distance between two points in phase space 
is taken to be the normalised Manhattan distance 

^ N T-l 

d(x k , xi) = — Yl a W I x k-t - x i-t I 3- 1 

i=l t=0 

where a(t) is a partition of the identity for t = 0,1, ...,T — 1, which in this 
section will be taken to be uniform: a(t) = 1/T, Vt. Since the range of each 
variable x^ was normalised, one easily checks that d(~Xk, ~xi) < 1, Vfc, /. 

To determine the correlation dimension, one computes the number of pairs 
of points with distance d(xk, x{) < p: N(p) (Grassberger and Procaccia, 1983). 
If T = T e is chosen so that IR e embeds the attractor of dimension D a , 
then N(p) ~ p Da . If T is too small for a proper embedding, T < T e , then 
iV(p) ~ p D ^ T \ where -D(T) is the dimension of the attractor projected onto 
the first T x N coordinates. The dimensions D(T) are determined from the 
slopes of logarithmic plots of N(p), by linear regression (Figure 2). If there 
is a deterministic dynamical rule, the graph of D(T) reaches an upper bound 
D = D a when T = T e . There are several obstacles which can inhibit a linear 
"scaling region" in the graph of log(N) vs. log(p). 
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1. Noise. A stochastic signal superimposed on the deterministic signal can 
disperse the points along directions orthogonal to the D a - dimensional attractor; 
this is evidenced by a steeper slope of the logarithmic graph. In the case of our 
data set, measurement error can be estimated to be less than 2% of the range of 
each variable; if we assume that the signed error is equally distributed on both 
sides of the origin, the measurement error in the Manhattan distance between 
points in R 7VxT is of the order of (Feller 1968) 



2 

e m ~ = %. 3.2 

VTxN 

For systematic measurement errors the random-sign argument is not valid and 
one must consider the upper bound on the error, e m < 2%. 

2. Lack of data. For a short data set, at small p the total number of pairs of 
points with distance less than p becomes too small to be considered statistically 
significant and one should require that 



N(p) > 50. 3.3 

In our case, the graphs of N(p) reveal that this places a lower bound p 3 — 4%, 
depending on T, below which the points of the graphs scatter randomly due to 
the bad statistics. Since this lower bound is relatively large, one cannot safely 
assume that, for an infinitely long data set, the scaling region would continue 
with the same slope all the way down to p — > 0. This problem is frequently 
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encountered in climate and weather data sets (see Grassberger 1986, Lorenz 
1991, and references therein) and implies that the correlation dimension, defined 
as the effective dimension in the limit p — > 0, cannot be computed. One should 
then consider the dimension estimates as effective dimensions at a certain scale. 
As Lorenz has suggested, it is possible that this effective dimension be a good 
approximation of the correlation dimension of a simpler local sub-system which 
is more or less decoupled from the global climate system: at lower values of p, 
the coupling with external variables would become relevant, leading to a much 
higher correlation dimension. We will describe evidence in support of Lorenz' 
conjecture below. 

3. Edge effect. For large values of p, comparable to the size of the attractor, the 
number of pairs saturates when a sphere of radius p centered at a typical point 
on the attractor extends beyond the edge of the attractor, in an empty region of 
phase space: the maximum possible number of pairs taken from n data points 
is n 2 . The scaling region therefore ends below this saturation point, and 



N(p) « n 2 . 3.4 

To see exactly where saturation becomes significant, one can only draw the 
complete graph of N(p) and observe at which point saturation occurs. From 
Figures 2-4, one sees that saturation becomes significant for N(p) > n 2 /100. 

4. Scale-dependent dimension. Some attractors have different apparent dimen- 
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sions at different scales; in some cases even the small-scale limit of the effective 
dimension is not well-defined and the correlation dimension does not exist, as 
for example with Zaslavskii's map, where D a alternates between D a xs 1 and 
D a ks 1.6 in the limit p — > (Grassberger and Procaccia, 1983). If the effective 
dimension varies with p in the region between the lower bounds (subsections 1 
and 2) and the upper bound (s.s. 3), it may be impossible to find a physically 
significant scaling region. 

In our case, a clear scaling region is found up to T = 10 (Figure 3) and to a 
lesser extent at T = 14 (Figure 4) . The scaling regions are taken to begin above 
N(p)/n = 0.01 (N(p) ?y 40) and extend to larger p with the limitation that 
the linear regression coefficient be greater than or equal to 0.999. The slopes 
of the scaling regions (Figure 5) give D(2) = 8.5, D(4) = 9.7, D{7) = 11.12, 
-D(IO) = 11.67 and -D(14) = 11.68, indicating an effective attractor dimension 
D eff = 11.7 at the scale 0.04 < p < 0.08. 

At larger scales (0.08 < p < 0.12), one notes a slight increase in the 
effective dimension, to -D e // ~ 14, although the scaling region is not sufficient 
to give an accurate estimate of the slope. This contrasts with the decrease in 
the slope with saturation, at p > 0.12. 

At the other end of the correlation graph, the first few points for which we 
can consider that the statistics is satisfactory suggest a steeper slope, particu- 
larly for larger values of T (Figures 4, 6); this hints that the effective dimension 
of the dynamics at small scales may be greater than in the scaling region, per- 
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haps due to a weak coupling to external variables as suggested by Lorenz' work 
(Lorenz, 1991). Let us assume that this external coupling can be approximated 
by a random variable with variance a 2 and mean zero, added on to each phase 
space variable. The average squared distance between two points in phase space 
would then be p 2 = p' 2 + 197V 2 , where p' is the average without noise. For ex- 
ample, one can determine from Figures 2-4 the average distance P200 such that 
there are -/V(p 2 oo) = 200 pairs of data points with distance less than /9200- From 
N(p2oo) w 0.04 at T = 7 and p 200 w 0.05 at T = 14 one finds 19a 2 « 1.3 x 10" 4 . 
This allows one to compute P200 ~ 0.065 at T = 29, in agreement with Fig- 
ure 6. Although this is by no means a formal argument, the accuracy of the 
prediction for T = 29 indicates that one may be well justified to interpret the 
correlation graphs as follows: the correlation graphs indicate the existence of a 
deterministic system with attractor dimension D a = 11.7 coupled to external 
perturbations, which can be reasonably well modeled as a random term added 
on to each variable, with 1.3 x 10 4 and mean zero. 

If the deterministic signal is a representation of the local atmospheric 
dynamics and the random part is a representation of external perturbations, 
then one might think that a fully deterministic model could be obtained if 
one included in the phase space the so-called "external" variables, perhaps by 
including measurements of other variables besides the N = 19 variables of the 
data set. A lower bound on the dimension of the attractor of the coupled 
system can be derived from the slope of the correlation graph at low values of 
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p; this would give an effective dimension in excess of -D e // ~ 35 for p < 0.04 
(T = 7) or p < 0.05 (T = 14). Such a large attractor dimension indicates 
that a reconstruction of the phase space would require a very large number of 
data points, corresponding to daily observations over at least 10 35 / 11 7 years 
just to reconstruct the attractor at the same very low resolution with which we 
reconstructed the 11.7-dimensional attractor. The approach suggested in the 
previous paragraph, to treat the perturbations as random noise, stands out as 
more realistic. 

Whether or not the speculations of the previous two paragraphs stand 
up to future scrutiny, one can surely not assume that the effective dimension 
D e ff ~ 11.7 is valid at all scales, or that it is a valid estimate of the correlation 
dimension of the attractor, defined as the effective dimension in the limit R — > 0. 

At the scale 0.04 < p < 0.08, the effective dimension -D e // ~ 11.7 appears 
to be trustworthy: each point in the scaling region is based on a large number 
of pairs, so that N(p)/n 2 is insensitive to the size of the data set (this has 
been checked explicitly by further reducing the data set), and is well below the 
saturation, as is evidenced by the fact that the slope does not drop off until much 
larger values of N(p). Some numerical experiments lend further confidence to 
the effective dimension result: 

1. The sensitivity to the choice of delay was checked by averaging the data by 
sets of three days and thereby reducing the length of the data set to n/3 days; 
the same attractor dimension was recovered. 
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2. The sensitivity to the choice of variables was examined by considering only 
the first 6 variables for each day, instead of 19. Again, the same attractor 
dimension was found. A further reduction, to the first 2 variables, led to an 
underestimation of the effective dimension, D e ff ~ 11.1, which indicates that 
these two variables were not perfectly coupled to the system as a whole. Al- 
though this result may surprise strong believers in the so-called "Takens rule" , 
following which any one variable should reflect the dynamics of the entire sys- 
tem through its interactions with all others, in practice if the interaction is too 
weak then they are detected only in the fine structure of the attractor, below 
the scale p = 0.04. This implies that if one is going to use a single variable 
in the correlation method, then it is important to choose one which is strongly 
coupled to the others, as noted by Lorenz (Lorenz 1991). 

3. Finally, we performed the standard check of generating an artificial data 
set with the same statistics but no deterministic dynamics, and found that 
the upper bound D(T) < D e ff disappears. We generated such a data set 
by randomly mixing the n = 4015 patterns, and found that the graph D(T) 
increases monotonously, passing D = 34 at T = 14. 

The largest Liapunov exponents can be estimated by considering the aver- 
age rate of divergence of nearby trajectories (Briggs 1990, Gade and Amritkar 
1990, Ellner et al. 1991, Galbraith 1992, Wales 1991, Zeng, Eykholt and Pielke 
1991). For each available data point in the phase space, o^G M TeXN , the near- 
est neighbour is identified and its evolution is compared to that of the original 
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pattern, step by step. Here we excluded near-neighbors that are also neighbors 
in time (measurements within 20 days of each other) since their trajectories do 
not diverge. The error at each time step is averaged over the patterns. The 
error grows rapidly with a time scale roughly equal to 2 days, then reaches 
a slowly rising upper bound which is significantly lower than the average dis- 
tance between randomly chosen points (Figure 7). The upper bound indicates 
a potential for longer-term predictability based on the persistence of weather 
patterns. The doubling of the error in two days would indicate a Liapunov 
exponent Ai ~ |/n(2), although the data set is clearly insufficient to consider 
this a serious estimate. 

A quantitative estimate of the theoretical predictability limit can be ob- 
tained as a byproduct of the correlation method (Fraedrich 1987, Fraedrich and 
Leslie 1989, Nese 1989). 



_ ln(N(p;T))-ln(N(p;T + m)) 

-'-pre • O.O 

m 

From the slope of the graph of N(p; T) as a function of T, with p = 0.05, 
one finds T pre = 1.83 days (Figure 8). Another estimate is given by the slope 
of the graph D(T) in the linear region previous to the saturation at D{T e ) = 
D e ff. A linear regression (Figure 9) gives T pre = 2.1 by this criterion. The 
similarity between these estimates, and the consistence with the divergence 
rates discussed in the previous paragraph, hints that they may be valid in spite 
of the very short data set. The prediction error at time T would then be 
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e(T) ps eo e T / Tpre for small T, where eo is the scale at which the reconstruction 
of the attractor is meaningful, e ~ 4%. Note that the predictability limit falls 
well short of the 2-3 week limit of large scale weather patterns studied in General 
Circulation Models (GCM's). This confirms the claim made in the introduction, 
that the high sensitivity of local atmospheric dynamics in the tropics invalidates 
the grid approach of the GCM's for aspects of the weather that are sensitive 
to the local variables, such as tropical precipitation. Grid-reductions of the 
fluid equations make sense only if the system is sufficiently stable that the 
inevitable simplifications in the equations and the grid reduction itself do not 
lead to a numerically unstable code. For highly unstable systems, the attractor 
reconstruction method is an obvious candidate to guarantee numerical stability 
without over-simplifying the dynamics. Unfortunately the high dimensionality 
of the attractor severely limits the accuracy of the phase space reconstruction 
method in this particular example, as we shall see shortly. 
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4. Local Reconstruction of the Dynamical Map 



In the previous section, we saw that the state of the weather system can be 
described by a set of T e m 10 consecutive days of measurements, corresponding 
to one point in the phase space IR e . Here, we will be concerned with the 
reconstruction of the dynamics (Farmer and Sidorowich 1987, 1988, Abarbanel 
et al. 1990, Giona, Lentini and Cimagalli 1991, Gouesbet 1991, Linsay 1991, 
Tsonis 1992, Eisner and Tsonis 1992). Given an initial point xq in this phase 
space, we wish to compute its orbit. The dynamical evolution integrated over 
one time step is a map f : TR TeXN — > M TeXN ; we will denote by fx the local 
reconstruction of this map in the vicinity of xq. A generalization of the method 
of analogues will be used (Lorenz 1963, 1969, van den Dool 1989, Toth 1989): 
the evolution of the initial point x o will be based on the data points x k closest 
to io and their known evolution one day ahead, x/j+i = f(xk) 

We will use the distance (3.1), with weights a(t) decreasing linearly with 
the delay so that a(T e — 1) = a(0)/2. 



a(t) - 4(Te ~ 1} ~ 2t 41 
a[t) 3T e (T e - 1) 



We will consider the data points Xk such that d(xk, %o) < where rj is 
the radius of the ball around xq where we wish to determine the local map fz,. 
The reconstructed map is given by 
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ELi x ( V - d (%k, x ) J x fc+ i 
fL(xo) = , 4.2 



where A is a threshold-linear weight function, 



X(x) = x 9{x) 4.3 

for a local linear reconstruction (9 is the step function). 

Thus, the dynamical model (4.2) is a weighted average of xjt+i = f(xfc) 
over the points Xk in an ry-ball centered at xo, weighted by r\ — d(xk, xq). 

For low values of 77, this model functions as an associative memory to 
recall a temporal sequence of patterns, with a storage capacity that can be 
computed analytically using statistical methods developed for neural networks 
(Zertuche et al., 1994a,b). Here, we are interested in supercritical values of 77, 
where the learned sequence is unstable and the model acquires positive Liapunov 
exponents, like the physical system being modeled. 

The prediction model has two parameters which must be determined by 
optimization: the radius 77, and the number of time steps which determine a 
point in phase space: T e . 

Based on the previous results on mutual information estimates and the 
application of Grassberger and Procaccia's algorithm, one would expect that 
T e should be sufficient to give a proper embedding of the attractor, T e > 7, 
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and r) should be such that a typical initial point has a sufficient number of 
neighbours with distance less than rj for a meaningful linear interpolation. For 
the given embedding dimension, the number of points should be N(rj) xs 500; 
which would require that rj ~ 0.1. Unfortunately, this is not far below the 
saturation point (N < 4000), so one cannot expect a very fine reconstruction 
of the dynamics: the true dynamical map over such a large ?y-ball is not likely 
to be approximately linear, and therefore f^(a;o) ^ f(xo). 

The optimization was performed by removing the last year of data and 
using it to extract a statistical sampling of initial test-points xq. The skill for 
one-day lead forecasts was found to be optimal for T = 14, r\ = 0.1, where 

skill = 1- << di(fL(x ),x fc+ i) >> . 4.4 

1 N 

di(xk,xi) = jjJ2\ x k ) -4 ) \ ■ 4.5 
i=i 

The skill of the optimal model (T = 14, rj = 0.1) is 

Max(skill) = 0.919 4.6 

which compares to 0.904 for persistence, 0.905 for the seasonal average and 0.865 
for random pattern selection. More details on the results of the prediction code 
are being published separately (Waelbroeck et al. 1994). 
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At optimum, we also measured the skill in the prediction of the anomaly of 
rainfall accumulations over a 10-day period, a variable chosen for its relevance in 
agriculture: the skill was equal to 64% of the variance of the observed anomaly 
over the 11-year span of measurements, where the anomaly of a variable at a 
given date is defined as the difference between the value of the variable and its 
seasonal average, evaluated over all available data within 10 days of this date, 
for each of the 11 years of data (an average over 231 samples). 

The 10-day rainfall predictions for all of 1992 (the last year in the data 
set) are compared to the seasonal average and to the observed precipitations 
in Figure 10. Although the predictions clearly do not follow the observations 
very closely, the unusual early beginning of the rain season was predicted by 
the model. 
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5. Summary 



In summary, we have shown that the effective dynamics of a tropical 
weather system, at a scale corresponding to 3.5—8% of the range of the variables, 
reflects an underlying deterministic dynamics with an attractor of dimension 
D a 11.7. The local linear structure of the attractor indicates a theoretical 
predictability of about 2 days for linear reconstructions of the dynamics. 

A sharp increase in the effective dimension at smaller scales, R/R ma x < 
4%, may indicate that external perturbations of the local system become signif- 
icant, in accordance with Lorenz' conjecture that the climate system consists of 
a number of weakly coupled subsystems whose attractors can be detected with 
the Grassberger-Procaccia algorithm. 

A simple reconstruction of the dynamics was proposed, which showed some 
skill in 10-day predictions: this long-term predictability appears to reflect the 
persistence of weather patterns, which would also cause the slow decline of the 
mutual information as a function of the delay. 

Altogether, three data analysis methods were applied and gave reasonably 
coherent results for the following characteristics of the system. 

1. Embedding dimension: The mutual information analysis gave T e = 14, the 
scaling graphs led to T e = 7 and the optimization of the local reconstruction 
model gave T e = 14. 
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2. Short-term predictability, limited by Ai: The mutual information analysis 
indicates that T pre < 1 day, the scaling graphs gave T pre < 2 days and the local 
reconstruction model, T pre < 1 day. 

3. Long term predictability, from the persistence of weather patterns: After a 
quick initial drop, the mutual information fades out more slowly with a time 
scale of 20 days. Also, the distance of nearby trajectories quickly reaches a 
maximum distance p ~ 0.1 after which it increases more slowly, with a time 
scale of about 20 days. Finally, the prediction model shows that useful skill can 
be extracted for 10-day predictions. 

The general coherence of these results lends some confidence that they 
reflect genuine physical properties of the dynamical system. There appears to 
be three effective dynamics, at three different scales. 

1. At very fine scales, one has a very complex system involving the local system 
and its couplings to the external variables, with an effective dimension of the 
attractor D a > 35. 

2. At the scale 0.04 < p < 0.08, the local weather system dominates, with an 
effective attractor dimension D a 11.7, a Liapunov exponent Ai > |/n(2) and 
a predictability limited to a couple of days. 

3. At the scale p w 0.1, one has general features of weather patterns which 
follow a much softer dynamical evolution, characterised by long predictablity 
times, of the order of 20 days. 
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The situation is similar to that encountered in linear regression forecasts, 
where a short-term predictability due to a first-order Markov process is followed 
by a longer "tail of autocorrelation" which allows prediction of monthly means 
well beyond what can be expected from the first-order process (Gutzler and Mo 
1983, Horel 1985, Trenberth 1985, Van den Dool, Klein and Walsh 1986, Roads 
and Barnett 1984, Strauss and Halem 1981)*. Of course in this article the short 
term predictability is modeled by a non-linear rule based on 14 consecutive days 
of data, but a similar "tail of mutual information" is likely to be responsible for 
the quality of the predictions at medium-long range, as anticipated in Section 
2. 

The external perturbations only affect the system progressively, reaching 
the amplitude p = 0.05 after 14 days; this indicates that the short-term pre- 
dictability could be improved and extended several days, if one could improve 
on the resolution with which the local dynamics is reconstructed: this would re- 
quire improving the use of the available local data to better define the structure 
of the attr actor near the scale R/R max « 4%. 

Beyond that point, any further improvement in prediction skill would re- 
quire introducing the large scale external perturbations from general circulation 
models. This last task requires a methodology for determining what aspects of 
large-scale weather patterns are most relevant to the local system. We hope 
that the preliminary work which we have described in this article has laid the 

* We thank the referee for this comment 
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first stones towards a new approach to small-scale meteorology, which is custom- 
designed to predict local, chaotic phenomena. 
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FIGURE CAPTIONS 

Table 1 The 19 variables observed each day are given, as well as the number 
of relevant bits indicating the measurement accuracy, and the information cal- 
culated by partitionning the range of each variable into 64 subintervals. The 
maximum and minimum value of each variable prior to normalization is also 
given. Atmospheric pressure is corrected for elevation. 

Table 2 The mutual information of each pair of observables at the same time 
step is given; again the interval [0,1] is divided into 64 subintervals, so that 
the maximum possible information is equal to 6 bits. Note that the diagonal 
terms represent the information of each variable individually. The sum of all the 
terms on the diagonal gives the information needed to specify the 19 variables 
independently, while the sum of all terms below the diagonal gives the total 
amount of mutual information among them. The difference between these two 
quantities is equal to the information needed to specify the 19 variables together. 

Table 3 The mutual information of each variable is given as a function of the 
delay, as well as the total over all variables. Some variables have a relatively 
large persistence even for delays of 14 days, while for other variables the mutual 
information drops very quickly with the delay. The total mutual information 
at a delay t = 1 day is less than lOtotal information in one day of data, so one 
can consider that consecutive days provide reasonably independent coordinates 
for a phase space reconstruction. 

Figure 1 The additional information needed to specify the variables at day T+l, 

38 



knowing these variables at days number 1, 2, T, is given as a function of 
T. No new information is gained by extending the phase space vector beyond 
14 consecutive days of data. This indicates that the dynamics is deterministic 
with an embedding dimension corresponding to Te E 14. 

Figure 2 The dimension D(T) of the attractor's projection on a 19T-dimensional 
subspace of the embedding space is given, as a function of T. A plateau is clearly 
reached at T=7, indicating an effective dimension of the attractor Da E 11.7. 

Figure 3 The scaling graph gives the number of pairs of data points in phase 
space with distance less than r, as a function of r, if we use a slightly redundant 
embedding with 10 consecutive days of data. One distinguishes four regions 
at increasing values of r: at the lowest values there appears to be a tendency 
for the slope to increase, but this occurs at values of N too low to trust the 
statistics. Next comes the scaling region, with a slope equal to Da E 11.67. 
One then enters a non-linear region where the slope increases at first, to about 
14, before it begins to decrease as a consequence of the saturation, when r 
becomes comparable to the size of the attractor. 

Figure 4 The scaling graph for T = 14 consecutive days of data displays the 
same features as for T = 10, although the scaling region with a slope equal to 
Da E 11.68 is narrower. The increase in the slope at low values of r is more 
significant. This may reflect a higher effecive dimension of the system coupled 
to external variables, as suggested by Lorenz' work; a time of 14 days would 
then appear to be sufficient to affect the local system up to the statistically 
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significant scale rE 0.055. 

Figure 5 The number of pairs within a ball of radius r, N(r), scales like a power 
of r. The bilogarithmic plot gives a straight line which can be determined by 
linear regression. The scaling graphs are represented for four different values of 
the embedding dimension, correslonding to T = 2, 4, 7 and 10. The projected 
dimension D(T) of the attractor on the chosen subspace of the true embedding 
space is given by the slope of the graph. The slope does not increase from T 
= 7 to T = 10, indicating that 7 consecutive days of data suffice to define the 
state of the system (a point in phase space). 

Figure 6 The scaling graph for T = 29 consecutive days of data shows very 
clearly an increase in the slope at low values of r, up to r E 0.065 where there 
is no question that the effect observed is statistically significant, since the total 
number of pairs observed is then equal to 401560.05 E 200. The higher slope 
Deff E 36 shows that the correlation dimension of the coupled system is at least 
this large. 

Figure 7 The distance between two initially nearby points as they evolve in the 
pattern set is given as a function of the evolution time. After a rapid rise which 
reflects the short term chaotic dynamics, this distance follows a slowly rising 
slope towards the upper limit which corresponds to the distance between two 
randomly chosen patterns. 

Figure 8 An estimate of the short -term (chaotic) predictability is given by the 
graph of the number of pairs found in a ball of fixed radius as a function of 
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the projected embedding dimension, d=19T. This number decreases exponen- 
tially with the dimension d, and the rate of exponenial decline is a measure of 
predictability. 

Figure 9 Another estimate of the short-term predictability is given by the graph 
of the projected attractor dimension versus the projected embedding space di- 
mension, D(T), in the linear region. This measure gives the result Tpre E 2.1, 
comparable to that of Figure 8. 

Figure 10 The actual precipitation curve for 1992, smoothed by 10-day averaging 
(solid line) is compared to the normal precipitation curve, defined as an average 
over 10 years of data for dates within 10 days of the current date (thick dotted 
line) and the predictions of the model for 10-day cumulative precipitation (fine 
dotted line). The shift in the y-axis reflects the annual average which was 
removed. Note that 1992 was an unusual year in that the rain season begun 
almost two months ahead of schedule. This sort of phenomenon is precisely what 
a model should be able to predict if it is to be useful in practical applications, 
such as agriculture. The predictions slightly exagerate the out-of-season winter 
rains (first bump at the left), but predict accurately the beginning of the rain 
season at day N! 60. The other unusual phenomenon, a dry spell followed by a 
recovery of rains late in the rain season, is predicted, but with a delay of about 
10 days. 
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